Search Results: "mpalmer"

25 February 2013

Matthew Palmer: libvirt errors that are not helpful

Since there is absolutely zero Google juice on this problem, here are some hints in case someone else is out there beating their head on their keyboard in frustration. The problem: when trying to define a storage pool (or run 90+% of other virsh commands), you get this sort of result:
# virsh pool-define /tmp/pooldef
error: Failed to define pool from /tmp/pooldef
error: this function is not supported by the connection driver: virStoragePoolDefineXML
Or this:
# virsh pool-create /tmp/pooldef
error: Failed to create pool from /tmp/pooldef
error: this function is not supported by the connection driver: virStoragePoolCreateXML
Not helpful at all. The problem is (or, at least, it was for me) that I have both KVM and VirtualBox installed (I prefer KVM, but Vagrant uses VirtualBox and I'm playing around with it). It would appear that libvirt prefers VirtualBox over KVM, which is stupid, because VirtualBox doesn't appear to be fully supported (as evidenced by the extensive set of functions that are not supported by the VirtualBox connection driver). The solution: edit /etc/libvirt/libvirt.conf and ensure that the following line is defined:
uri_default = "qemu:///system"
This tells libvirt to use KVM (via qemu) rather than VirtualBox, and you can play with pools to your heart's content.

22 January 2013

Matthew Palmer: When is a guess not a guess?

When it's a prediction. In the 4 January edition of the Guardian Weekly, the front-page story, entitled "Meet the world's new boomers", contained this little gem:
Back in 2006, [PricewaterhouseCoopers] made some forecasts about what the global economy might look like in 2050, and it has now updated the predictions in the light of the financial crisis and its aftermath.
Delightful. They "made some forecasts about what the global economy might look like". Given that they clearly didn't include any impact of the GFC in those forecasts, it clearly wasn't a particularly accurate forecast. Y'know what an inaccurate prediction is called? Guesswork. Let's call a spade a spade here. I see this all the time, and it's starting to shit me. People make predictions and forecasts and projections hither and yon, and they're almost always complete bollocks, and they never get called on it. I read the Greater Fool blog now and then, and that blog is chock-full of examples of people making predictions which have very little chance of being in any way accurate. While Dr Ben Goldacre and others are making inroads into requiring full disclosure in clinical trials, I'm not aware of anyone taking a similar stand against charlatans making dodgy-as-hell predictions over and over again, with the sole purpose of getting attention, and without any responsibility for the accuracy of those predictions. Is anyone aware of anyone doing work in this area, or do I need to register badpredictions.net and start calling out dodginess?

3 January 2013

Matthew Palmer: Everything's Better With Cats

The ancient Egyptians were a pretty cool bunch, but their worship of cats really added something to their civilisation (double bonus: their word for cat was "mau"). The Internet itself, while undeniably a fantastic resource, reached new heights with the introduction of LOLCats. If you are cat-poor, you can swap your shabby tat for a tabby cat, while if you've gone a bit overboard you can sell your excess cats to cat converters. However, cats have found minimal employment in systems administration. Until now. As the day job has been an early adopter of btrfs, everyone at work has been very interested in the reported hash DoS of btrfs, and it has been a topic of considerable discussion around the office. However, it can be a tough topic to explain to people less well versed in the arcana of computer science. Not to be deterred, Barney, our tech writer, took the standard explanation, added some cats, and came up with an explanation of the btrfs hash DoS that your parents can understand. The density of cat-related puns is impressive. (Incidentally, if you don't need cats to understand btrfs hash DoS attacks, and live in the Sydney area, you might be interested in working for Anchor as a sysadmin.)
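For readers who prefer code to cats: the attack relies on an attacker being able to craft many names that hash into the same bucket, turning cheap hash lookups into linear scans. This toy Ruby table is my own illustration of that idea; nothing here is btrfs's actual hash function or code.

```ruby
# A toy separate-chaining hash table with a deliberately weak hash
# function, to illustrate collision flooding. NOT btrfs code.
class WeakTable
  attr_reader :buckets

  def initialize(size = 64)
    @buckets = Array.new(size) { [] }
  end

  # Weak on purpose: the byte-sum of the key. Any permutation of the
  # same characters collides, so an attacker can fill one bucket at will.
  def weak_hash(key)
    key.bytes.sum % @buckets.size
  end

  def insert(key, value)
    @buckets[weak_hash(key)] << [key, value]
  end

  def lookup(key)
    pair = @buckets[weak_hash(key)].find { |k, _| k == key } # linear scan
    pair && pair.last
  end
end

# Six crafted filenames, one bucket: every lookup walks the same chain.
colliding = %w[abc acb bac bca cab cba]
table = WeakTable.new
colliding.each { |name| table.insert(name, name.upcase) }
```

With a few hundred thousand such names in one directory, each insert or lookup scans everything inserted before it, which is the quadratic blow-up at the heart of the DoS.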

26 October 2012

Matthew Palmer: The e-mail PDA

I've been a wannabe GTD aficionado for some years. I've wanted to do it, but managing lists has always been something with too much friction, overhead, or whatever. Finally, though, I think I might have found a way to manage lists that works. My use-case isn't unique, although I will concede I'm perhaps being more dogmatic than most. I want something that: My previous attempt was a tool I called tagnote: it was a vim-outliner file full of hierarchically organised outliner entries, with tags inlined. It was a neat idea, but it wasn't smooth to add/browse/delete items, and it didn't work with my phone at all (trying to use vim for any length of time on a bottom-of-the-range Android phone would kill me). The current iteration, as the title of this post suggests, is a list manager that runs entirely on e-mail. It really is a perfect symbiosis: So what have I got, exactly? It's fairly straightforward: That last point is the one I'm really happy I achieved. I've always been a fan of "hide it until you need it", but my previous system didn't let me do that. Now, though, I have a separate list called "tickler", and all the items in there have an X-Tickle header, which specifies the date I want to see them. Each night a cronjob runs through the tickler and moves anything for today into the INBOX. An X-Tickle-Repeat header lets me have things that repeat over and over again. So, in short, using entirely open-source tools and a couple of hours of my time doing things I enjoy anyway (shell scripts! woo!), I've now got a list manager that doesn't get in my way more than it absolutely has to. We'll see how long I last this time before I feel the urge to improve my lists again.
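The tickler job could be as small as this sketch. It is my reconstruction from the description above; the folder layout, function names, and everything apart from the X-Tickle header are guesswork, not the author's actual script (which was shell, not Ruby).

```ruby
require 'date'
require 'fileutils'

# Does this message's X-Tickle date say it should surface today
# (or has the date already passed)?
def tickle_due?(message_path, today = Date.today)
  header = File.foreach(message_path).find { |l| l.start_with?('X-Tickle:') }
  return false unless header
  Date.parse(header.split(':', 2).last.strip) <= today
end

# Nightly job: move every due message from the tickler folder to INBOX.
def run_tickler(tickler_dir, inbox_dir, today = Date.today)
  Dir.glob(File.join(tickler_dir, '*')).each do |msg|
    next unless File.file?(msg) && tickle_due?(msg, today)
    FileUtils.mv(msg, inbox_dir)
  end
end
```

An X-Tickle-Repeat implementation would, instead of just moving the message, also write a copy back with the header date bumped by the repeat interval.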

3 September 2012

Philipp Kern: IPv6 support in debian-installer

I tried to continue netcfg's journey towards IPv6 support in debian-installer. Matt Palmer wrote a large patch set many moons ago (kudos!), and Colin Watson polished it and included it in Ubuntu. The reason it was not merged into Debian first was ifupdown: version 0.7 finally supports IPv6, and it only entered Debian unstable at the end of May. Due to a bunch of changes to netcfg, one reason being a Summer of Code project on it, I had to forward-port those 50 patches and add two bug fixes on top. Hence it's possible that I introduced some breakage, even if the result works well for me (apart from one DHCPv6 oddity that cannot be seen from within KVM). The tree can be found in the people/pkern/ipv6 branch. I was unable to check whether Ubuntu has introduced any additional patches on top of the old patch set, as I'm not familiar enough with bzr.
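For reference, the ifupdown 0.7 syntax this unblocks looks like the following in /etc/network/interfaces (addresses are from the 2001:db8::/32 documentation prefix; substitute your own):

```
# Static IPv6 configuration (ifupdown >= 0.7)
iface eth0 inet6 static
    address 2001:db8::10/64
    gateway 2001:db8::1

# Or let the interface use stateless autoconfiguration (SLAAC)
iface eth0 inet6 auto
```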

This mini.iso (detached signature) netinst image contains a debian-installer current as of today and two additional patched udebs: netcfg with the aforementioned IPv6 patch set, and busybox with ping6 enabled. I'd appreciate it if you could test it in one (or more) of the following environments: IPv4-only (DHCPv4 / static); wireless (IPv4 / IPv6 as desired, there has been some refactoring); stateless IPv6 autoconfiguration (it even supports RDNSS and DNSSL); stateless IPv6 autoconfiguration with stateless DHCPv6; and stateful DHCPv6. It will try to configure dual stack if possible. Please note that many Debian mirrors are not yet IPv6-enabled; even prominent ftp.CC.debian.org hosts like ftp.de.debian.org still do not support it. So you'll have to pick one carefully if you want to try an IPv6-only installation. (Which worked for me!)

If you have any feedback for me, please leave it here as a comment or mail me at pkern@d.o. Thanks!

22 July 2012

Matthew Palmer: Podcasting protip

Don't spend the first two minutes of the first episode of your podcast telling everyone what a micropodcast is, and how iTunes only lets you have 20MB per episode. The only exception might be if you were making a podcast about podcasting. Which I wasn't. That is all.

19 July 2012

Matthew Palmer: Melt your cores with Rake's multitask

Reading documentation does pay off. Browsing through the Rakefile format documentation just now, I found mention of the multitask method, which declares that all of that task's prerequisites can be executed in parallel. A comparison run:
$ rake clean; time rake build
real    0m7.116s
user    0m6.788s
sys     0m0.260s
$ rake clean; time rake multibuild
real    0m3.820s
user    0m8.809s
sys     0m0.288s
This is a trivially small build I'm doing, I must admit, but halving the build time (in this case, at least) pays huge dividends in my perceived productivity. It really blows the dust out of my CPU cores, too, which tend to be woefully underutilised (this being a quad-core laptop and all). So I say unto you all: go forth and multitask!
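For anyone who hasn't met it, the pattern looks like this. The task names are invented for the sketch, and `include Rake::DSL` is only needed because this runs as a plain script rather than a Rakefile:

```ruby
require 'rake'
include Rake::DSL

ran = []

# Three independent build steps; none depends on another's output.
%w[parser lexer codegen].each do |part|
  task(part) { ran << part }
end

# task runs its prerequisites one after another...
task :build => %w[parser lexer codegen]

# ...while multitask runs the same prerequisites in parallel threads.
multitask :multibuild => %w[parser lexer codegen]
```

The only caveat: prerequisites of a multitask must not secretly depend on each other's side effects, or you've bought yourself a race condition.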

23 May 2012

Matthew Palmer: Unhealthy Obsessions

This morning, I nearly missed getting off at my train station to go to work because I was too engrossed in writing documentation for the company's leave accounting system. I wonder if there's a 12-step program for people like me. It can't be healthy.

13 May 2012

Matthew Palmer: vmdksync helps you escape from VMware

When I wrote lvmsync late last year, I didn't realise I was being typecast. Before too long, I realised that the logic I'd implemented for lvmsync would also help me with a separate migration project I'd been dreading: getting the day job off VMware. Back in the early days of virtualisation, management made the decision to run VMware, for all the usual reasons ("commercially supported!", "industry standard!", and so on). Unsurprisingly (to me, anyway) it didn't take too long for management to realise that it wasn't the best choice for us. When you've got umpty-billion dollars to spend on hardware, software, and support, VMware might be the right option (although Amazon doesn't seem to think so). Anchor's company culture, on the other hand, is built around "smart staff, simple systems" over "dumb staff, smart vendors", because no vendor is ever going to care about our customers as much as we do. So VMware was never going to work for us. Unfortunately, as happens all too often, once VMware was in place there was very little motivation to get rid of it and move those customers onto the chosen replacement (the one we were deploying all new customers on). I happen to think this is a terrible attitude in general, one that makes life so much harder in the long term. I believe strongly in retrofitting old systems to keep them up to date with the current state of the art, and in keeping technical debt under control. But I wasn't running the show back when we stopped putting new customers on VMware, so the few VMware servers we had stayed around far longer than they should have. Recently, though, bad things started to happen. The VMware servers were starting to fall apart. The Windows machine we had to keep around to use the VMware management console started crapping out, and when the choice was between doing unspeakable things to Windows and just ditching VMware... well, it wasn't much of a choice.
The only remaining question was how to do the migration off VMware with the least amount of downtime for our customers. I was really quite surprised that nobody out in Internet land appeared to have come up with a simple, robust tool to do this. Sure, some vendors had all-singing, all-dancing toolkits that cost ridiculous amounts of money, required you to install their agent on the machine involved, and promised the earth, but it all smelt of snake oil and bullshit. In true hacker style, then, I decided to write something myself. The model I came up with mirrored lvmsync's quite closely, because that one worked, and it turned out to be surprisingly easy to implement once I managed to reverse-engineer the file format (VMware has a PDF spec of a bunch of its file formats, but whoever wrote it was enough of an evil genius to make it utterly incomprehensible to anyone who doesn't already know the file format, whilst making perfect sense to anyone who does). The result: vmdksync. It is nothing but 80-odd lines of ruby whose sole purpose is to take a delta.vmdk file and write the changes stored in that file onto a file or block device that is a copy of the flat.vmdk file, which you can copy while the VM is still running (after you've made a snapshot, of course). It helped me provide a painless migration path away from VMware, and I'd be really pleased if it helped some other people do the same. Share and enjoy!
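The core of the idea fits in a few lines. This sketch is mine, not vmdksync's actual code; a real delta.vmdk needs its sparse-extent headers decoded first, so assume the change records have already been parsed into (offset, data) pairs to be applied to the copied flat image:

```ruby
# Write each changed region from the delta at its original offset in the
# copy of the flat image. `delta_records` is assumed to be the already-
# parsed list of [byte_offset, data] pairs.
def apply_delta(delta_records, flat_copy_path)
  File.open(flat_copy_path, 'r+b') do |flat|
    delta_records.each do |offset, data|
      flat.seek(offset, IO::SEEK_SET)
      flat.write(data)
    end
  end
end
```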

25 December 2011

Matthew Palmer: The Other Way...

Chris Siebenmann sez:
The profusion of network cables strung through doorways here demonstrates that two drops per sysadmin isn't anywhere near enough.
What I suspect it actually demonstrates is that Chris's company hasn't learnt about the magic that is VLANs. All of the reasons he cites in the longer, explanatory blog post could be solved with VLANs. The only time you can't get away with one gigabit drop per office and an 8-port VLAN-capable switch is when you need high capacity, and given how many companies struggle by with wifi, I'm going to guess that sustained gigabit-per-machine is not a common requirement. So, for Christmas, buy your colleagues a bunch of gigabit VLAN-capable switches, and you can avoid both the nightmare of not having enough network ports, and the more hideous tragedy of having to crawl around the roofspace recabling an entire office.

17 December 2011

Matthew Palmer: Rethtool: How I Learned to Stop Worrying and Love the ioctl

Damn those unshaven yaks. I'm trying to write a Nagios plugin for work that will comprehensively monitor network interfaces and make sure they're up, passing traffic, all those sorts of things. Of course, I'm doing it all in Ruby, because that's how I roll. So, I need to Know Things about the interface. Everyone does that with ethtool, right? Sure, if your eyeballs are doing the parsing. But have you ever tried to machine-parse it? To put it as eloquently as possible:
# ethtool eth0
Settings for eth0:
 Supported ports: [ TP MII ]
 Supported link modes:   10baseT/Half 10baseT/Full 
                         100baseT/Half 100baseT/Full 
                         1000baseT/Half 1000baseT/Full 
 Supports auto-negotiation: Yes
 Advertised link modes:  10baseT/Half 10baseT/Full 
                         100baseT/Half 100baseT/Full 
                         1000baseT/Half 1000baseT/Full 
 Advertised pause frame use: No
 Advertised auto-negotiation: Yes
 Link partner advertised link modes:  10baseT/Half 10baseT/Full 
                                      100baseT/Half 100baseT/Full 
                                      1000baseT/Half 1000baseT/Full 
 Link partner advertised pause frame use: No
 Link partner advertised auto-negotiation: Yes
 Speed: 1000Mb/s
 Duplex: Full
 Port: MII
 PHYAD: 0
 Transceiver: internal
 Auto-negotiation: on
 Supports Wake-on: pumbg
 Wake-on: g
 Current message level: 0x00000033 (51)
 Link detected: yes
Parse that, bitch!
Or perhaps not. At any rate, I decided that it would be most advantageous if I went straight to the source and twiddled the ioctl until it did my bidding. And thus, about 5 hours later, was Rethtool born. Once I worked out a less-than-entirely-crackful way of dealing with C structs in Ruby (after a bit of digging around, I went with the appallingly-undocumented-but-sufficiently-featureful CStruct), and after I finally worked out I was passing the wrong damned struct to ioctl(SIOCETHTOOL) (speaking of appallingly undocumented: fuck you, ioctl, and all your twisty-passages children), it was smooth sailing. So, if you're one of the eight or so people on earth who will ever need to get at the grubby internals of your network interfaces using Ruby (and can't do it via some sysfs magic), Rethtool is for you.
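For contrast, the sysfs escape hatch mentioned above really is this easy when it covers what you need. A sketch: the attribute names are real sysfs files under /sys/class/net, but the helper itself is my own, not part of Rethtool:

```ruby
# Read common link attributes straight from /sys/class/net, no ioctl
# required. Attributes that don't exist for an interface return nil.
def iface_info(name, base = '/sys/class/net')
  dir = File.join(base, name)
  read = lambda do |attr|
    begin
      File.read(File.join(dir, attr)).strip
    rescue SystemCallError
      nil # e.g. no `speed` on virtual interfaces, or `carrier` while down
    end
  end
  {
    operstate: read.call('operstate'), # "up", "down", "unknown"
    carrier:   read.call('carrier'),   # "1" when a link is detected
    speed:     read.call('speed'),     # megabits per second
    mtu:       read.call('mtu'),
  }
end
```

What sysfs doesn't give you is the advertised/supported link-mode sets, which is exactly the gap Rethtool fills.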

14 November 2011

Francois Marier: Ideal OpenSSL configuration for Apache and nginx

After recently reading a number of SSL/TLS-related articles, I decided to experiment and look for the ideal OpenSSL configuration for Apache (using mod_ssl since I haven't tried mod_gnutls yet) and nginx.

By "ideal" I mean that this configuration needs to be compatible with most user agents likely to interact with my website as well as being fast and secure.

Here is what I came up with for Apache:
SSLProtocol TLSv1
SSLHonorCipherOrder On
SSLCipherSuite RC4-SHA:HIGH:!kEDH
and for nginx:
ssl_protocols  TLSv1;
ssl_ciphers RC4-SHA:HIGH:!kEDH;
ssl_prefer_server_ciphers on;

Cipher and protocol selection

In terms of choosing a cipher to use, this configuration does three things:

Testing tools

The main tool I used while testing various configurations was the SSL Labs online tool. The CipherFox extension for Firefox was also quite useful for quickly identifying the selected cipher.

Of course, you'll want to make sure that your configuration works in common browsers, but you should also test with tools like wget, curl and httping. Many of the online monitoring services are based on these.

Other considerations

To increase the performance and security of your connections, you should ensure that the following features are enabled:
Note: If you have different SSL-enabled name-based vhosts on the same IP address (using SNI), make sure that their SSL cipher and protocol settings are identical.

12 November 2011

Matthew Palmer: Misleading error messages from blktrace

If you ever get an error message from the blktrace tool that looks like this:
BLKTRACESETUP(2) /dev/dm-0 failed: 2/No such file or directory
Thread 3 failed open /sys/kernel/debug/block/(null)/trace3: 2/No such file or directory
Thread 2 failed open /sys/kernel/debug/block/(null)/trace2: 2/No such file or directory
Thread 0 failed open /sys/kernel/debug/block/(null)/trace0: 2/No such file or directory
Thread 1 failed open /sys/kernel/debug/block/(null)/trace1: 2/No such file or directory
FAILED to start thread on CPU 0: 1/Operation not permitted
FAILED to start thread on CPU 1: 1/Operation not permitted
FAILED to start thread on CPU 2: 1/Operation not permitted
FAILED to start thread on CPU 3: 1/Operation not permitted
Don't be alarmed: your disk hasn't suddenly disappeared out from underneath you. In fact, it means quite the opposite of what "No such file or directory" might imply: there is already a blktrace of that particular block device in progress, and you'll need to kill that one off before you can start another one. Thank $DEITY for the kernel source code; it was the only hope I had of diagnosing this particular nit before I went completely bananas and smashed my keyboard into small pieces.

28 October 2011

Matthew Palmer: rsync for LVM-managed block devices

If you've ever had to migrate a service to a new machine, you've probably found rsync to be a godsend. Its ability to pre-sync most data while the service is still running, then perform the much quicker "sync the new changes" action after the service has been taken down, is fantastic. For a long time, I've wanted a similar tool for block devices. I've managed ridiculous numbers of VMs in my time, almost all stored in LVM logical volumes, and migrating them between machines is a downtime hassle. You need to shut down the VM, do a massive dd | netcat, and then bring the machine back up. For a large disk, even over a fast local network, this can be quite an extended period of downtime. The naive implementation of a tool capable of doing a block-device rsync would be to checksum the contents of the device, possibly in blocks, and transfer only the blocks that have changed. Unfortunately, as network speeds approach disk I/O speeds, this becomes a pointless operation. Scanning 200GB of data and checksumming it still takes a fair amount of time; in fact, it's often nearly as quick to just send all the data as it is to checksum it and then send the differences.1 No, a different approach is needed for block devices. We need something that keeps track of the blocks on disk that have changed since our initial sync, so that we can just transfer those changed blocks. As it turns out, keeping track of changed blocks is exactly what LVM snapshots do. They actually keep a copy of what was in the blocks before they changed, but we're not so interested in that. What we want is the list of changed blocks, which is stored in a hash table on disk. All that was missing was a tool that read this hash table to get the list of blocks that had changed, then sent them over a network to another program that was listening for the changes and could write them into the right places on the destination. That tool now exists, and is called lvmsync.
It is a slightly crufty chunk of ruby that, when given a local LV and a remote machine and block device, reads the snapshot metadata and transfers the changed blocks over an SSH connection it sets up. Be warned: at present, it's a pretty raw piece of code. It does nothing but the "send updated blocks over the network" part, so you have to deal with the snapshot creation, initial sync, and so on yourself. As time goes on, I'm hoping to polish it and turn it into something Very Awesome. "Patches accepted", as the saying goes.
  1. rsync avoids a full-disk checksum because it cheats and uses file metadata (the last-modified time, or mtime of a file) to choose which files can be ignored. No such metadata is available for block devices (in the general case).
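Strip away the snapshot-metadata parsing and the SSH plumbing, and the transfer step amounts to the following. This is a toy sketch of mine, not lvmsync code; in reality the [offset, length] extent list is what gets dug out of the snapshot's on-disk hash table:

```ruby
# Copy only the changed extents from the source device/file into the
# destination copy made during the initial sync.
def sync_extents(src_path, dst_path, extents)
  File.open(src_path, 'rb') do |src|
    File.open(dst_path, 'r+b') do |dst|
      extents.each do |offset, length|
        src.seek(offset, IO::SEEK_SET)
        dst.seek(offset, IO::SEEK_SET)
        dst.write(src.read(length))
      end
    end
  end
end
```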

27 September 2011

Vincent Bernat: Speeding up SSL: enabling session reuse

Session reuse is one of the most important mechanisms for improving SSL performance: by submitting an appropriate blob to the server, a client is able to trigger an abbreviated handshake, improving latency and computation time. There are two distinct ways to achieve session reuse: session identifiers, as described in RFC 5246, and session tickets, as described in RFC 5077. A quick note: when I say SSL without a version number, you should read TLS instead. Look at the article for TLS on Wikipedia for some background on this.

Theory of operation

To establish an SSL connection, four messages need to be exchanged between client and server. With a latency of 50 ms, this means a 200 ms overhead to establish the connection (on top of the TCP handshake). Moreover, to share a common secret, both the client and the server need to perform some public-key cryptographic operations, which are computationally costly. [figure: full SSL handshake] To avoid a full SSL handshake each time a client requests a resource, the client can request an abbreviated handshake, saving a complete round-trip (100 ms) and avoiding the costliest part of the full SSL handshake. [figure: abbreviated SSL handshake] Two mechanisms can be used to accomplish an abbreviated handshake:
  1. When the server sends the Server Hello message, it can include a session identifier. The client should store it and present it in the Client Hello message of the next session. If the server finds the corresponding session in its cache and agrees to resume it, it will send back the same session identifier and continue with the abbreviated SSL handshake. Otherwise, it will issue a new session identifier and switch to a full handshake. This mechanism is detailed in RFC 5246. It is the most common one, because it has existed since the earliest versions of SSL.
  2. In the last exchange of a full SSL handshake, the server can include a New Session Ticket message (not represented in the handshake described in the picture) which contains the complete session state (including the master secret negotiated between the client and the server, and the cipher suite used). This state is encrypted and integrity-protected with a key known only to the server. This opaque datum is known as a session ticket. The details lie in RFC 5077, which supersedes RFC 4507.
The ticket mechanism is a TLS extension. The client can advertise its support by sending an empty Session Ticket extension in the Client Hello message. The server will answer with an empty Session Ticket extension in its Server Hello message if it supports it. If one of them does not support this extension, they can fall back to the session identifier mechanism built into SSL. RFC 5077 identifies situations where tickets are preferable to session identifiers. The main improvement is avoiding the need to maintain a server-side session cache, since the whole session state is remembered by the client rather than the server. A session cache can be costly in terms of memory, and difficult to share between multiple hosts when requests are load-balanced across servers.
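Both mechanisms can be exercised end-to-end without leaving one process. The sketch below is my illustration (not from the original article), using Ruby's stdlib OpenSSL bindings: stand up a throwaway TLS 1.2 server, connect twice, and re-present the first connection's session blob on the second connect.

```ruby
require 'openssl'
require 'socket'

# Throwaway self-signed certificate for the demo server.
key  = OpenSSL::PKey::RSA.new(2048)
cert = OpenSSL::X509::Certificate.new
cert.version    = 2
cert.serial     = 1
cert.subject    = cert.issuer = OpenSSL::X509::Name.parse('/CN=localhost')
cert.public_key = key.public_key
cert.not_before = Time.now - 60
cert.not_after  = Time.now + 3600
cert.sign(key, OpenSSL::Digest.new('SHA256'))

server_ctx = OpenSSL::SSL::SSLContext.new
server_ctx.cert = cert
server_ctx.key  = key
server_ctx.max_version = OpenSSL::SSL::TLS1_2_VERSION # keep the demo simple

tcp  = TCPServer.new('127.0.0.1', 0)
port = tcp.addr[1]
server = Thread.new do
  2.times do
    ssl = OpenSSL::SSL::SSLSocket.new(tcp.accept, server_ctx)
    ssl.accept # full handshake the first time, abbreviated the second
    ssl.close
  end
end

client_ctx = OpenSSL::SSL::SSLContext.new
client_ctx.max_version = OpenSSL::SSL::TLS1_2_VERSION
connect = lambda do |session|
  ssl = OpenSSL::SSL::SSLSocket.new(TCPSocket.new('127.0.0.1', port), client_ctx)
  ssl.session = session if session # re-present the blob from last time
  ssl.connect
  result = [ssl.session, ssl.session_reused?]
  ssl.close
  result
end

first_session, _ = connect.call(nil)           # full handshake
_, reused        = connect.call(first_session) # abbreviated handshake
server.join
tcp.close
puts "second connection resumed: #{reused}"
```

Whether the resumption rides on a server-side cache entry or a ticket depends on what both ends negotiated; `session_reused?` reports true either way.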

Browser support

Session identifiers are part of SSL and have therefore long been supported by clients and servers. Session tickets, however, are an optional TLS extension, and their support is not as widespread. Support for tickets was added in OpenSSL 0.9.8f (October 2007) and in GnuTLS 2.9.3 (August 2009). For NSS, the set of libraries behind most browsers, support for RFC 5077 arrived in version 3.12 (June 2008). Schannel, Microsoft's implementation of TLS, does not currently support tickets. To check whether a browser supports session resumption with and without tickets, I have written a web server listening on several ports with different configurations (session cache disabled or enabled, ticket support disabled or enabled). With the help of some Javascript, it is possible to interact with this server to determine what kind of session resumption a browser supports. [figure: checking browser support for session resumption] It is difficult to get extensive results this way, since you need to install a lot of browsers to get anything valuable. Surprisingly, the Android 2.3.4 browser (as shipped by CyanogenMod) does not support any session resumption. Another way to check whether a browser supports session resumption is to use a sniffer and spot Client Hello messages: if a client tries to resume a session, it will use an appropriate session identifier or send a ticket. A simple program parsing PCAP files makes it possible to automate this task and gather some interesting statistics. Here are random facts from more than 300,000 requests (38,000 clients) to the customer care service of a major French telco:
  • 35% of requests exhibit support of tickets;
  • 53.3% of requests ask to resume without tickets; 25.6% with tickets;
  • 67.2% of requests exhibit support of SNI;
  • 86.7% of requests use TLS 1.0; the remainder use SSL 3.0; almost no TLS 1.1/1.2;
  • on average, a client executes 8 SSL handshakes;
  • ciphers supported by all clients are 3DES-SHA, RC4-MD5 and RC4-SHA.
With some log analysis to match each request with the appropriate user agent, we can split the world in two parts: browsers supporting RFC 5077 (Chrome and Firefox) and browsers not supporting RFC 5077 (Internet Explorer, Opera 9.80, Safari for Mac OS and iOS, and the Android browser).

Web server support

There is no unique recipe to configure a web server to handle session resumption appropriately and efficiently. Here are some generic directions:
  • if the web server uses multiple processes, you need to configure a shared memory session cache; usually, tickets will work just fine in this setup;
  • if you use local load-balancing, one easy way to avoid any problem is to ensure that one IP is mapped to one server (either statically with hashing or dynamically with a stick table); if your load-balancer understands SSL, you can also use session identifiers (and disable tickets1) to build a stick table;
  • if you don't want to apply the previous advice, you need to share the session cache between the servers with something like memcached, and disable tickets;
  • if you use global load-balancing (DNS based), there is no need to ensure that session resumption works across geographic pools since a user will usually stick to one pool.
Remember that only one client out of three supports TLS tickets. Therefore, you cannot rely on them to ensure proper session resume. You need a session cache.

Setting a session cache with Apache & nginx

Apache features two different SSL engines. The first one is mod_ssl, which uses OpenSSL as its backend. It is possible to set up a shared session cache with SSLSessionCache, but the memcached-enabled session cache is available only in the development branch. You can also get one by switching to the mod_gnutls backend, where the cache is enabled with GnuTLSCache; in this case, do not enable tickets, because they cannot be shared. With nginx, you need to enable the shared session cache with ssl_session_cache. It currently lacks a memcached-enabled session cache, but Matt Palmer has some patches to add one to the 0.8 branch. However, those patches can have a serious impact on nginx performance, since retrieving a session from memcached is done synchronously (mostly because of a limitation in OpenSSL's design, which does not allow asynchronous callbacks to be registered with SSL_CTX_sess_set_get_cb()). As a proof of concept, I have also added a similar feature to stud, the scalable TLS unwrapping daemon, a very efficient network proxy terminating SSL connections. The same limitation as for nginx applies: performance will suffer. Look at the pull request for more details.
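For the common single-host case, the stock nginx shared cache needs only two directives (sizes here are illustrative):

```nginx
# One cache shared by all worker processes; 10 MB holds roughly 40,000
# sessions. The timeout bounds how long a session stays resumable.
ssl_session_cache    shared:SSL:10m;
ssl_session_timeout  10m;
```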

Sharing tickets

RFC 5077 suggests that tickets make it possible to load-balance requests across servers. However, to the best of my knowledge, there is currently no web server able to share tickets across a pool. When a server initializes its SSL stack, it randomly generates the keys that will be used to protect and encrypt tickets. In OpenSSL, from ssl/ssl_lib.c:
    /* Setup RFC4507 ticket keys */
    if ((RAND_pseudo_bytes(ret->tlsext_tick_key_name, 16) <= 0)
            || (RAND_bytes(ret->tlsext_tick_hmac_key, 16) <= 0)
            || (RAND_bytes(ret->tlsext_tick_aes_key, 16) <= 0))
        ret->options |= SSL_OP_NO_TICKET;
Therefore, with a round-robin load-balanced pool of servers, tickets have to be disabled, because servers would not accept tickets from their neighbours. One way to solve this is to generate the keys deterministically by hashing some common secret with the private key. I have implemented this approach for stud. Here is a simplified version (no error handling, no allocation) of the pull request for this feature:
    unsigned char keys[48];
    EVP_PKEY *pkey = grab_private_key();
    /* To get our key, we sign the seed with the private key */
    unsigned int siglen;
    unsigned char sign[LARGE_ENOUGH];
    EVP_MD_CTX mdctx;
    EVP_MD_CTX_init(&mdctx);
    EVP_SignInit(&mdctx, EVP_sha256());
    EVP_SignUpdate(&mdctx, some_secret, strlen(some_secret));
    EVP_SignFinal(&mdctx, sign, &siglen, pkey);
    /* And we keep only the first bytes. */
    memcpy(keys, sign, sizeof(keys));
    /* Tell OpenSSL to use those keys */
    SSL_CTX_set_tlsext_ticket_keys(ctx, keys, sizeof(keys));
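The same derive-rather-than-randomise trick is easy to express in Ruby. This is a hypothetical helper of mine mirroring the C above, not code from stud: sign a shared seed with the server's private key and keep the first 48 bytes as the name/HMAC/AES ticket keys.

```ruby
require 'openssl'

# Derive the 48 bytes of ticket-key material (16-byte key name,
# 16-byte HMAC key, 16-byte AES key) deterministically, so every server
# holding the same private key and seed derives identical keys.
def ticket_keys(seed, private_key)
  signature = private_key.sign(OpenSSL::Digest.new('SHA256'), seed)
  signature[0, 48]
end
```

This works because RSA PKCS#1 v1.5 signatures are deterministic; the scheme's safety rests on the seed being shared only among the servers that hold the key.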

Testing

While tests can be done with just openssl s_client and openssl sess_id, testing all aspects can take a lot of time. For this purpose, I have written a client testing session resumption with and without tickets. Here is a typical result: [figure: RFC 5077 session resumption results for twitter.com] The program tries to establish five consecutive SSL connections for each IP, without and with tickets. For each IP, the last four SSL connections should always reuse the first SSL session. Here, Twitter's web servers seem properly configured: for each IP, session resumption is successful. Their web servers do not support tickets, or may have disabled them because they share the session cache between servers inside the same pool.

  1. RFC 5077 describes interactions between session identifiers and session tickets. When using tickets, the server should send back an empty session identifier. The client may present an empty session identifier or generate one. However, for all practical purposes, it seems that servers send back a non-empty session identifier and clients just stick to that identifier. Please investigate if you want to use SSL session identifiers for load-balancing while keeping tickets enabled.

23 August 2011

Vincent Bernat: SSL termination: stunnel, nginx & stud

Here is the short version: to get better performance from your SSL terminator, use stud on a 64bit system with Émeric Brun's patch for SSL session reuse, with an AES cipher suite (128 or 256 bits, it does not really matter), without DHE, on as many cores as needed, and with a 1024bit key unless more is needed.

Introduction A quick note: when I say SSL, you should read TLS v1 instead. Look at the article for TLS on Wikipedia for some background on this. One year ago, Adam Langley, from Google, stated SSL was not computationally expensive any more:
In January this year (2010), Gmail switched to using HTTPS for everything by default. Previously it had been introduced as an option, but now all of our users use HTTPS to secure their email between their browsers and Google, all the time. In order to do this we had to deploy no additional machines and no special hardware. On our production frontend machines, SSL/TLS accounts for less than 1% of the CPU load, less than 10KB of memory per connection and less than 2% of network overhead. Many people believe that SSL takes a lot of CPU time and we hope the above numbers (public for the first time) will help to dispel that. If you stop reading now you only need to remember one thing: SSL/TLS is not computationally expensive any more.
This is a very informative post containing tips on how to improve SSL performance by reducing latency. However, unlike Gmail, you may still be worried about raw SSL performance. Maybe each of your frontends is able to serve 2000 requests per second, in which case the CPU overhead of SSL is significant. Maybe you want to terminate SSL in your load balancer (even if you know this is not the best way to scale).

Tuning SSL There are a lot of knobs you can turn to get better performance from SSL: choosing the best implementation, using more CPU cores, switching to a 64bit system, choosing the right cipher suite and the appropriate key size, and enabling a session cache.

We will consider three SSL terminators, all using OpenSSL under the hood. stunnel is the oldest one and uses a threaded model. stud is a recent attempt to write a simple SSL terminator which is both efficient and scalable; it uses the one-process-per-core model. nginx is a web server which can be used as a reverse proxy and can therefore act as an SSL terminator. It is known as one of the most efficient web servers, hence its choice here, and it also features basic built-in load balancing. Since stud and stunnel do not have this feature, we use them with HAProxy, a high-performance load balancer that usually defers the SSL part to stunnel (stud can act as a drop-in replacement here). These implementations were already tested by Matt Stancliff: he first concluded that nginx sucked at SSL, then that nginx did not suck at SSL (in the first case, nginx was the only one to select a DHE cipher suite). The published details were very scarce; I hope to bring more here.

This shows the importance of the selected cipher suite. The client and the server must agree on the cipher suite to use for symmetric encryption. The server selects the strongest one it supports from the list proposed by the client, so if you enable some expensive cipher suite on your server, it is likely to be selected.

The last important knob is SSL session reuse. It matters for both performance and latency. During the first handshake, the server sends a session ID, which the client can later use to request an abbreviated handshake. This handshake is shorter (a latency improvement of one round-trip) and reuses the previously negotiated master secret (a performance improvement, since the costliest part of the handshake is skipped).
See the article from Adam Langley for additional information (with helpful drawings). The following table shows which cipher suite is selected by some major web sites. I have also indicated whether they support session reuse. See this post from Matt Palmer to learn how to check this.
Site                         Cipher               Session reuse?
www.google.com               RC4-SHA
www.facebook.com             RC4-MD5
twitter.com                  AES256-SHA
windowsupdate.microsoft.com  RC4-MD5
www.paypal.com               AES256-SHA
www.cmcicpaiement.fr         DHE-RSA-AES256-SHA
RFC 5077 defines another mechanism for session resumption that does not require any server-side state. I did not investigate how it works in much detail, but it does not rely on the session ID.
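As an illustration of these knobs, here is a hypothetical nginx configuration fragment pinning down cheap cipher suites and enabling a shared session cache. The directive values are illustrative and are not the settings used in the benchmarks below:

```nginx
# Explicit, cheap AES suites only; this also rules out expensive DHE exchanges.
ssl_ciphers AES128-SHA:AES256-SHA;
ssl_prefer_server_ciphers on;
# Shared session cache (10 MB holds tens of thousands of sessions) and lifetime.
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;
```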

Benchmarks All these benchmarks have been run on an HP DL 380 G7 with two Xeon L5630 CPUs (running at 2.13 GHz, for a total of 8 cores), without hyperthreading, using a 2.6.39 kernel (HZ set to 250) and two Intel 82576 NICs. The Linux conntrack subsystem has been disabled and file limits have been raised above 100,000. A Spirent Avalanche 2900 appliance is used to run the benchmarks. We are interested in raw SSL performance in terms of handshakes per second. Because we want to observe the effects of session reuse, the scenario run by most tests has each client make four successive requests, reusing the SSL session for the last three. There is no HTTP keepalive and no compression, and the HTTP answer from the server is 1024 bytes. These choices were made because our primary target is the number of SSL handshakes per second. It should also be noted that HAProxy alone is able to handle 22,000 TPS on one core; during the tests, it was never the bottleneck.

Implementation We run a first bench to compare nginx, stud and stunnel on one core. This bench runs on a 32bit system, with the AES128-SHA1 cipher suite and a 1024bit key. Here is the result for stud (version 0.1, with Émeric Brun's patch for SSL session reuse) (figure: stud, 1 CPU). The most important plot is the top one. The blue line is the attempted number of transactions per second (TPS) while the green one is the number of successful TPS. When the two lines start to diverge, we have reached some kind of maximum: the number of unsuccessful TPS (the red line) also starts to rise. There are several noticeable points: the maximum TPS (788 TPS), the maximum TPS with an average response time of less than 100 ms (766 TPS) and less than 500 ms (776 TPS), and the maximum TPS with less than 0.01% packet loss (783 TPS). Let's have a look at nginx (version 1.0.5) (figure: nginx, 1 CPU). It achieves about the same performance as stud (763 TPS). However, over 512 TPS, you get more than 100 ms of response time; over 556 TPS, you even get more than 500 ms! I see this behaviour in every bench with nginx (with or without proxy_buffering, with or without tcp_nopush, with or without tcp_nodelay) and I am unable to explain it. Maybe there is something wrong in the configuration. After hitting this limit, nginx starts to process connections in bursts; therefore, the number of successful TPS is plotted with a dotted line while the moving average over 20 seconds is plotted with a solid line. The next plot compares the performance of the three implementations (figure: stunnel vs stud vs nginx, 1 CPU). On this kind of plot, the number of TPS that we keep is the maximum number of TPS where loss is less than 0.1% and average response time is less than 100 ms. stud achieves 766 TPS while nginx and stunnel are just above 500 TPS.

Number of cores With multiple cores, we can dedicate several of them to the SSL hard work. To get better performance, we pin each process to a dedicated CPU core. Here is the repartition used during the tests:
               1 core   2 cores  4 cores  6 cores
CPU 1, Core 1  network  network  network  network + haproxy
CPU 1, Core 2  haproxy  haproxy  haproxy  SSL
CPU 1, Core 3  SSL      SSL      SSL      SSL
CPU 1, Core 4  -        SSL      SSL      SSL
CPU 2, Core 5  -        -        network  SSL
CPU 2, Core 6  -        -        SSL      SSL
CPU 2, Core 7  -        -        SSL      SSL
CPU 2, Core 8  system   system   system   system + haproxy
Remember, we have two CPUs with four cores each. Cores on the same CPU share the same L2 cache (on this model), so the arrangement inside a CPU is not really important; when possible, we try to keep things together on the same CPU. The SSL processes always get exclusive use of a core since they will always be the busiest. The repartition is done with cpuset(7) for userland processes and by setting smp_affinity for network card interrupts. We keep one core for the system: it is quite important to be able to connect to and monitor the system even when it is loaded, and with so many cores we can afford to reserve one for this usage. Beware! There is a trap when pinning processes or IRQs to a core: you need to check /proc/cpuinfo to discover the mapping between kernel processors and physical cores. Sometimes the second kernel processor is the first core of the second physical CPU (instead of the second core of the first physical CPU). As you can see in the plot (figure: stunnel vs stud vs nginx, 6 CPU), only stud is able to scale properly. stunnel is not able to take advantage of the extra cores and its performance is worse than with a single core; I think this may be due to its threaded model and the fact that the userland of this 32bit system is a bit old. nginx achieves the same TPS as stud but is disqualified because of the increased latency. We won't use stunnel in the remaining tests.
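The pinning itself can be sketched with taskset, used here as a simpler stand-in for the cpuset(7) mechanism mentioned above; IRQs are pinned similarly by writing a CPU bitmask to /proc/irq/N/smp_affinity (which requires root). The core number below is illustrative:

```shell
# Sketch: pin a process to a single core and read its affinity back.
# taskset stands in for the cpuset(7) mechanism described in the text.
sleep 5 &
pid=$!
taskset -cp 0 "$pid" >/dev/null          # pin the process to core 0
affinity=$(taskset -cp "$pid" | awk -F': ' '{print $2}')
echo "affinity=$affinity"                # expect: affinity=0
kill "$pid" 2>/dev/null
```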

32bit vs 64bit The easiest way to get a performance boost is to switch to a 64bit system: TPS are doubled. Moreover, on our 64bit system, OpenSSL was built with support for AES-NI, a CPU extension that speeds up AES encryption and decryption, so it is enabled. The remaining tests are done on the 64bit system (figure: stud vs nginx, 64bit).

Ciphers and key sizes In our tests, the influence of the cipher is minimal. There is almost no difference between AES256 and AES128, and using RC4 adds some latency; the use of AES-NI may have helped AES avoid this penalty. On the other hand, using a 2048bit key has a huge performance cost: TPS are divided by 5 (figure: stud: cipher suites and key sizes).
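The key-size gap is easy to reproduce with OpenSSL's built-in benchmark, since private-key (sign) operations dominate the server side of the handshake. A quick sketch (requires a reasonably recent openssl for the -seconds flag; figures vary by machine):

```shell
# Compare raw RSA throughput for 1024bit vs 2048bit keys.
# -seconds 1 keeps the run short; the sign columns show the ~5x gap.
openssl speed -seconds 1 rsa1024 rsa2048 2>/dev/null | grep '^rsa'
```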

Session cache The last thing to check is the influence of SSL session reuse. Since we are on a local network, we see only one of its positive effects: better TPS thanks to reduced CPU usage. If there had been network latency, we would also have seen better TPS thanks to the removed round-trip (figure: stud: session reuse). This plot also explains why stud's performance falls after the maximum: because of the failed transactions, the session cache is not as efficient. This phenomenon does not exist when session reuse is disabled.

Conclusion Here is a summary of the maximum TPS reached during the benchmarks (with an average response time below 100 ms). Our control use case is the following: 64bit, 6 cores, AES128, SHA1, 1024bit key, 4 requests in the same SSL session.
Context                      nginx 1.0.5  stunnel 4.41  stud 0.1 (patched)
1 core, 32bit                512 TPS      503 TPS       766 TPS
2 cores, 32bit               599 TPS      -             -
6 cores, 32bit               804 TPS      501 TPS       4251 TPS
6 cores, 64bit               1799 TPS     -             9000 TPS
AES256                       -            -             8880 TPS
RC4-MD5                      -            -             7370 TPS
2048bit key                  -            -             1643 TPS
no session reuse             -            -             5844 TPS
80 requests per SSL session  -            -             10797 TPS
Therefore, in our control scenario, stud is able to sustain 1500 TPS per core. It seems to be the best current option for SSL termination. It is not available in Debian yet, but I intend to package it. Here is how stud is invoked:
# ulimit -n 100000
# stud  -n 2 -f 172.31.200.15,443 -b 127.0.0.1,80 -c ALL \
>   -B 1000 -C 20000 --write-proxy \
>   =(cat server1024.crt server1024.key dhe1024)
The =(...) construct is zsh process substitution: it puts the concatenated certificate, key and DH parameters into a temporary file whose name is passed to stud. You may also want to look at the HAProxy, nginx and stunnel configurations.

Matthew Palmer: UPSes in Datacentres

(This was going to be a comment on this blog post, but it's a Turdpress site that wants JS and cookies to comment. Bugger that for a game of skittles.) Rimuhosting's recent extended outage due to power problems was apparently caused by a transfer switch failure at their colo provider. This has led people to wonder whether putting UPSes in individual racks is a wise move. The theory is that in the event of a small outage, the UPS can keep things humming, and in an extended outage you can gracefully shut things down rather than having a hard thump. I happen to think this theory is bunkum. Your UPS is a newly instituted single point of failure. I'd be willing to bet that the cost of purchasing, installing, and maintaining the UPSes, plus the cost of the outages that inevitably result from their occasional failure, would be far greater than the cost of the occasional power outage you get in a well-managed facility. Good facilities don't have small outages. They don't have squirrels in the roof cavities, and they don't have people dropping spanners across busbars. The only outages they have are the big ones, when some piece of overengineered equipment turns out to be not so overengineered: the multi-hour (or multi-day) ones where your UPS isn't going to stop you from going down. Your SLA credit and customer goodwill are already toast, so all you're saving is the incremental cost of a little more downtime while your fscks run. If you want the best possible power reliability, get yourself into a really well-engineered facility, and run dual power on everything. Definitely run the numbers before you go down the UPS road; I'll bet you find they're not worth it.

Matthew Palmer: Oh HP, you Bucket of Fail

I recently got given a new printer, an HP LaserJet "Professional" 1 P1102w. It's fairly loudly touted on HP's website that this printer has "Full support" under Linux. And yet, it won't work with my Linux-based print server. Why? Because it uses a proprietary driver plugin, and that plugin is only available for x86 and amd64, while my print server is ARM-based. Well done, HP. You've managed to revive the old "all the world's a VAX" philosophy, on an OS that is more than capable of running on practically anything. You got that for free. Why do you insist on screwing with it? As an added bonus, when I try to "Ask a Question" on the HPLIP website, to politely (ha!) inquire as to the possibility of an ARM binary, I get sent to Launchpad, which does nothing more than tell me there is an "Invalid OpenID transaction". That's the entire content of the page. Useful. Lies, damned lies, and a double helping of proprietary software fail. My day is complete.
  1. I use scare quotes around "Professional" because, as far as I can tell, this is just an entry-level personal laser printer. There is nothing particularly professional about it.

21 August 2011

Matthew Palmer: Unintended Consequences: Why Evidence Matters

If you were trying to get rid of hiring discrimination (on grounds irrelevant to the ability to do the job), you'd think a good way to do it would be to reduce the hiring manager's ability to discriminate by restricting their access to irrelevant (but possibly prejudicial) information. It's certainly what I might come up with as an early idea in a brainstorming session. I'm not alone: France had the same idea and gave it a go, passing a law requiring companies to anonymise resumes before they reached any decision makers. So far, so average. But rather than just coming up with an idea and inflicting it on everyone via a blanket law, they did what should be done with all new ideas: they trialled it (with 50 large corporations, according to the report) before making it universal, to make sure that the theory matched reality. Then, after giving it a good shake, they examined the evidence and found that the idea had some unintended consequences:
Applicants with foreign names, or who lived in underprivileged areas, were found to be less likely to be called in for an interview without the listing of their name and address. Researchers reasoned that this was because employers and recruiters made allowances for subpar presentation or limited French if their performance could be explained by deprivation or foreign birth.
The icing on the cake is that, now the evidence is in, they're planning on making it optional (I'm not sure how that's different from killing it entirely, but I guess it's worth the same in the end). So we've got the quinella of decision-making awesome: trialling an idea before imposing it on everyone, and dropping it when the evidence said it didn't work. Far too often, we get far too attached to our ideas, and don't let them go when reality doesn't fit our preconceptions. Kudos to the people involved for not letting their egos get in the way of good government. Let it be an object lesson for us all.

19 August 2011

Matthew Palmer: Stream of Consciousness

This forum post on requiring formal letters of resignation made me smile:
HR does silly stuff like this all the time. Somebody's following some policy that was created because somebody verbally resigned nine years ago and then wanted to come back, and some executive said where's their letter and HR said we don't have one and the exec said that's not good and we oughta not be doing stuff to help people leave unless they're really leaving and HR said okay we'll have a policy and the exec said that's good. And the exec's not there anymore.
I'll leave everyone to draw their own conclusions as to why I was reading that particular thread.
